Abstract

The classical 'lock-and-key' and 'induced-fit' mechanisms for binding both originated in attempts to 
explain features of enzyme catalysis. For both of these mechanisms and for their recent refinements, 
enzyme catalysis requires exquisite spatial and electronic complementarity between the substrate and 
the catalyst. Thus, binding models derived from models originally based on catalysis will be highly 
biased towards mechanisms that utilize structural complementarity. If mere binding without catalysis 
is the endpoint, then the structural requirements for the interaction become much more relaxed. 
Recent observations on specific examples suggest that this relaxation can reach an extreme lack of 
specific 3D structure, leading to molecular recognition with biological consequences that depend not 
only upon structural and electrostatic complementarity between the binding partners but also upon 
kinetic, entropic, and generalized electrostatic effects. In addition to this discussion of binding without 
fixed structure, examples in which unstructured regions carry out important biological functions not 
involving molecular recognition will also be discussed. Finally, we discuss whether 'intrinsically 
disordered protein' (IDP) represents a useful new concept. 



Introduction 

Preparations of the enzyme emulsin cleave β-glycosides, 
but not α-glycosides, while preparations of invertase 
cleave α-glycosides, but not β-glycosides. From these 
observations, Emil Fischer suggested in 1894 that the 
enzyme and substrate exert a mutual effect on each other 
like a lock and key [1-2]. Thus, the lock-and-key hypothesis 
was originally derived from studies on catalysis, not 
molecular recognition. 

A short time later, in 1897, Paul Ehrlich applied the lock- 
and-key hypothesis to the problem of how an antibody 
binds specifically to a particular antigen [3-4]. In this 
example, the interaction results in specific binding, not 
catalysis. Thus, Ehrlich converted the lock-and-key 
hypothesis from explaining enzyme specificity to explain- 
ing protein-based molecular recognition. 



However, the lock-and-key model gives rise to some 
interesting questions. For enzymatic transfer of a 
phosphate from ATP to an -OH acceptor via a lock- 
and-key mechanism, why doesn't an -OH from water 
simply outcompete the -OH from the acceptor, thus 
leading to ATP hydrolysis instead of phosphate transfer? 
Daniel Koshland suggested that sequestering the reactants 
from water via small conformational changes that 
he called "induced fit" would solve the problem, especially 
since water itself would be too small to induce the needed 
conformational changes [5-6]. Induced fit is the formation 
of an encounter complex between molecules in conforma- 
tions distinct from their final conformations, followed by 
mutual structural adjustment until the intimate fit 
between the two partners is realized. Subsequent structural 
studies on a number of enzymes revealed substantial 
conformational changes upon substrate binding, which 
are consistent with induced fit but could also be explained 
by other mechanisms [7]. 

Besides induced fit, an alternative model called 'con- 
formational selection' has been proposed to explain the 
association between a macromolecule and a flexible 
ligand [8]. In conformational selection, the ligand 
assumes an ensemble of conformations, and the protein 
binds to the conformation that gives the best fit to the 
binding site. Conformational selection and induced fit 
are often discussed as the two extreme possible mechanisms 
for the binding of a flexible ligand, such as an IDP, to 
a structured partner. 

For enzymes, the lock-and-key and the induced-fit 
mechanisms lead to different kinetic equations, which 
can be considered as parts of a unified reaction cycle 
[9]. Analysis of enzyme kinetic data in terms of this 
unified reaction cycle suggests that substrate and 
enzyme concentrations determine whether conforma- 
tional selection, induced fit, or a mixture of the two 
underlies the reaction [10]. The key point here is that 
not only is the structure of the final complex important 
but also the mechanism used to achieve the final 
complex. 
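
The two limiting pathways can be written as a generic four-state cycle. The scheme below is a textbook-style sketch, not necessarily the formulation used in [9] or [10]. The symbols are labels introduced here for illustration: L is the flexible ligand in its unbound ensemble, L* its binding-competent conformation, P the structured partner, and each K an equilibrium constant written in the forward direction shown.

```latex
% Conformational selection: the ligand reaches the competent conformation first, then binds.
\mathrm{L} \;\overset{K_{\mathrm{conf}}}{\rightleftharpoons}\; \mathrm{L}^{*},
\qquad
\mathrm{L}^{*} + \mathrm{P} \;\overset{K^{*}_{\mathrm{bind}}}{\rightleftharpoons}\; \mathrm{L}^{*}\mathrm{P}

% Induced fit: an encounter complex forms first, then the ligand adjusts on the partner.
\mathrm{L} + \mathrm{P} \;\overset{K_{\mathrm{enc}}}{\rightleftharpoons}\; \mathrm{LP},
\qquad
\mathrm{LP} \;\overset{K_{\mathrm{adj}}}{\rightleftharpoons}\; \mathrm{L}^{*}\mathrm{P}

% Both routes connect the same end states, so the cycle must close:
K_{\mathrm{conf}}\, K^{*}_{\mathrm{bind}} \;=\; K_{\mathrm{enc}}\, K_{\mathrm{adj}}
\;=\; \frac{[\mathrm{L}^{*}\mathrm{P}]}{[\mathrm{L}][\mathrm{P}]}
```

Because the overall equilibrium constant is the same along either route, the two mechanisms cannot be distinguished from the final complex alone; they differ in the intermediates populated and in the kinetics.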

As for non-catalytic molecular recognition by flexible 
proteins, many DNA-binding proteins contain regions 
that undergo disorder-to-order transitions upon bind- 
ing to their DNA partners [11]. Georg Schulz collected 
several of these examples and suggested in 1979 that 
such a disorder-to-order binding mechanism could be 
helpful for some biological interactions by enabling 
the combination of relatively high specificity and low 
affinity [12]. Thermodynamic studies of several 
protein-DNA interactions gave further support for 
such disorder-to-order transitions; the 
authors, Ruth Spolar and Tom Record, described the 
binding process as "coupled-binding and folding," or 
in one place "extreme induced fit" [13]. Just as for 
enzyme catalysis, it has been suggested that conforma- 
tional selection and coupled binding and folding 
(sometimes called induced fit) represent the extreme 
possibilities for binding, and just as for enzymes, it has 
been suggested that either mechanism or even mixtures 
of the two mechanisms may be involved for any 
particular interaction [14]. 

As with the lock-and-key hypothesis, induced fit was 
originally developed to explain specific features of 
enzyme catalyzed reactions, and this concept was later 
converted to a molecular recognition mechanism when 
the underlying ideas were applied to non-catalytic 
binding interactions. 



While enzyme catalysis very often occurs without 
significant disorder, for several enzymes a disorder-to-order 
backbone rearrangement is a feature. In these 
cases, the substrate binds to a part of the active site that is 
structured, and then a disordered region folds onto the 
substrate, typically contributing residues involved in 
catalysis (and often excluding water); the same 
region then unfolds again to release the product [15-16]. 

Recent studies on enzyme reaction mechanisms using 
faster methods than previously available have revealed 
multiple steps in which conformational changes in the 
substrates are complementary to multiple conforma- 
tional changes in the enzymes. These multiple substeps 
involve small amounts of energy and only slight 
conformational changes, resulting in an overall better 
fit and larger energy decrease in the transition state 
energy than could likely be achieved by one large 
conformational change [7]. 

The importance of this background for the current 
discussion is that binding for the catalytic event is 
necessarily highly structured: the enzyme has to bind 
more tightly to a molecular intermediate form than to 
the undistorted ground state or to the product(s) [7,9], 
and so the physicochemical requirements for catalysis 
depend on interactions with exquisite steric and electro- 
static complementarity between the enzyme and the 
ligand. However, we argue that if the catalytic require- 
ment for the interaction is dropped and mere binding is 
the biological function, then the molecular recognition 
event itself ought to become more relaxed. The question 
here is whether molecular recognition in the absence of 
catalysis can become so relaxed that fixed 3D structure is 
no longer a requirement for binding. 

Effects on binding without specific structure formation 

Suppose an unstructured protein binds to a partner 
without forming structure. How would one evaluate 
such a complex in the absence of structure? How does 
one determine the biological significance of such an 
interaction if one were found? To address these issues, 
several examples are presented below in a progression 
from the most to the least amount of structure. 

In the first example, a region of intrinsic protein disorder 
binds to a partner via a disorder-to-order transition with 
part of the region remaining unstructured. Such events 
are very common. In one study of short segments that 
bind to protein partners, out of 372 binding segments 
containing 10,434 residues, 13% of the residues 
remained unstructured after binding as determined 
from the lack of electron density in the crystal structures. 
This is substantially higher than the 7% figure for 
disordered residues observed for a set of 848 structured 
protein monomers [17]. Are these disordered regions 
merely present or do they contribute to the binding free 
energy? 

In a few of the interactions containing IDP regions, 
studies have been carried out on the effects of removing 
all, or fractions, of the structurally unobserved regions 
(reviewed in [18]). Removal of such flanking disordered 
regions has been shown to result in both positive and 
negative changes in the free energy of binding (Fuxreiter, 
personal communication). 

In one case, the disordered splicing factor 1 (SF1) binds to 
the large subunit of the U2 small nuclear RNA auxiliary 
factor (U2AF). The SF1 segment binds through a motif of 
10 residues with a Kd of 23.8 nM, and removal of residues 
not in physical contact with U2AF changes the Kd to 
55.6 nM. On the other hand, full-length SF1 binds 
U2AF with a Kd of 11.8 nM, indicating that the flanking 
but unstructured regions contribute significantly to binding 
free energy [19]. These data demonstrate binding 
energy without specific structure formation. Evidently, the 
steric complementarity required for enzyme catalysis has 
become considerably relaxed for mere association. 
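
To put the three nanomolar values quoted above on a free-energy scale, they can be converted with the standard relation ΔG° = RT ln(Kd / 1 M). The sketch below assumes that the quoted values are dissociation constants and that the temperature is 298.15 K; the temperature, the function name, and the construct labels are assumptions made here for illustration and are not taken from [19].

```python
import math

R_KJ = 8.314462618e-3   # gas constant in kJ/(mol*K)
T_ASSUMED = 298.15      # assumed temperature in K (not stated in the text)


def binding_free_energy(kd_molar, temperature=T_ASSUMED):
    """Standard binding free energy in kJ/mol from a dissociation constant in M.

    Uses dG = RT ln(Kd / 1 M); more negative values mean tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ * temperature * math.log(kd_molar)


# Values quoted in the text (nM), assumed here to be dissociation constants.
kd_nM = {
    "full-length SF1": 11.8,
    "SF1 segment": 23.8,
    "contact residues only": 55.6,
}

dg = {name: binding_free_energy(kd * 1e-9) for name, kd in kd_nM.items()}

for name, value in dg.items():
    print(f"{name:>22s}: dG = {value:6.2f} kJ/mol")

# Difference attributable to residues outside the direct contact region.
ddg_flanks = dg["full-length SF1"] - dg["contact residues only"]
print(f"full-length vs contact-only: ddG = {ddg_flanks:.2f} kJ/mol")
```

Under these assumptions the difference between the full-length protein and the contact-only construct comes out a little under 4 kJ/mol, a modest but measurable contribution from residues that are not seen in the structure.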

In the case above, the disordered region exhibits a 
measurable free energy of association with the remainder 
of the protein without the formation of specific structure. 
There are at least three alternative mechanisms as 
discussed below. 

One possibility is that the unobserved residues alter the 
polypeptide in the unbound state. The removal of these 
residues would then affect the binding constant in the 
bound state. For example, if the binding mechanism were 
to depend on the conformational selection mechanism, 
then the unobserved residues could reduce the amount of 
time spent in a binding-competent conformation by a 
variety of mechanisms, thus reducing the on-rate, and 
lowering the overall affinity [8]. 

A second possibility is that there are several different 
conformations that enable the disordered protein to bind 
to the surface via short-range contacts. Having several 
such binding structures could result in missing structure 
in X-ray experiments due to incoherent scattering or to 
missing data in NMR experiments due to exchange 
broadening. Indeed, several examples of the same 
disordered region changing structure to bind to different 
surfaces have been observed [20-21]. All that would be 
needed is to have the alternative binding modes close to 
each other on the same binding surface. It could be 
argued that such a mechanism still uses structure for 
binding, just that there are multiple structures of similar 
energy that can interconvert over timescales required to 
collect either the X-ray or NMR data. We would give the 
counter argument that adopting multiple, different, 
interconverting structures is distinctly different from 
becoming structured upon binding. 

A third possibility is that electrostatic interactions could 
occur over long range without the formation of any 
specific structure at all. Such a result would truly be 
interaction without a specific complementary structure. 
Consideration of electrostatic interactions leads us to the 
second example. 

The second example involves the interaction between 
Sic1 and CDC4. In this example, the Sic1 protein is 
unstructured but contains nine similar motifs that are 
well separated along the sequence. Each of these motifs 
contains a central serine or threonine that can become 
phosphorylated [22]. Each phosphorylated Sic1 motif is 
recognized by the CDC4 protein, which recruits Sic1 for 
ubiquitination, thereby targeting it for degradation in 
late G1 phase, an event necessary for the onset of DNA 
replication [23]. Replacing different numbers of serines 
or threonines with alanines leads to loss of viability if 
fewer than six phosphorylation sites remain [22]. In vitro 
studies show that individual phosphorylated motifs bind 
weakly, but the affinity and the steepness of the binding 
isotherm increase as more sites are phosphorylated. 
When any six sites are phosphorylated, the affinity 
sharply increases and the binding isotherm is so steep that it 
resembles an on-off switch [22]. 
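
What a "steeper" isotherm means can be illustrated with a purely phenomenological Hill-type curve. This is a toy sketch, not the measured Sic1-CDC4 data and not the mean-field model of [25]; the exponent `n` is only a steepness parameter and should not be read as the number of phosphorylation sites. All names below are hypothetical.

```python
def fraction_bound(conc, k_half, n):
    """Hill-type isotherm: fraction bound at concentration `conc`.

    `k_half` is the concentration giving half-saturation and `n` is a
    phenomenological steepness parameter (n = 1 is a simple hyperbola).
    """
    x = (conc / k_half) ** n
    return x / (1.0 + x)


def switch_width(n):
    """Fold-change in concentration needed to go from 10% to 90% bound.

    For a Hill curve this is 81 ** (1 / n); smaller means more switch-like.
    """
    return 81.0 ** (1.0 / n)


if __name__ == "__main__":
    for n in (1, 2, 6):
        print(f"n = {n}: 10%-to-90% requires a {switch_width(n):.1f}-fold change")
```

With `n = 1` the transition from mostly unbound to mostly bound is spread over an 81-fold concentration range, whereas a large exponent compresses it to a narrow window, which is the on-off behavior described above.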

The CDC4 molecule contains just one site (!) that 
associates with the phosphorylated motif. If there is only 
one binding site, how do the additional sites increase the 
affinity and sharpen the binding isotherm? The main 
effect seems to be electrostatic interactions that keep the 
phosphorylated motifs near to the single binding site, so 
that, as soon as one hops off, another is nearby to hop on 
[24]. A mathematical description of these effects has 
been developed, and the resulting mean field statistical 
mechanical model for the electrostatic interactions gives 
reasonably good agreement with the experimental data, 
including both estimates of the threshold number of 
phosphorylation sites for binding, and also including 
experimental affinities of CDC4 for Sic1 fragments with 
different total charges [25]. 

The Sic1-CDC4 and the SF1-U2AF examples discussed 
above demonstrate that non-structured protein can 
contribute very significantly to binding free energy. As 
reviewed by Tompa and Fuxreiter [18], many additional 
examples have been observed in which unstructured 
regions contribute both positively and negatively to the 
binding free energy between an IDP and its protein 
partner. We anticipate that similar results will be found 
for IDPs binding to nucleic acids if such examples have 
not been found already. These observations clearly make 
the point that unstructured regions of protein can 
contribute to binding free energy without becoming 
structured in the final complex. 

CDC4 has a single site that binds one phosphate group 
together with its flanking amino acids, whereas Sic1 
has multiple phosphorylation sites, each within a 
sequence-similar motif whose flanking amino acids 
fit onto the binding site on CDC4. Nevertheless, the 
mathematical model explaining the increased affinity 
arising from the remainder of the molecule does not 
require specific structure, merely electrostatic interactions 
between a dynamic IDP and its partner. The 
question then arises whether such non-specific interactions 
could lead to specific protein association without 
the formation of any long-lasting complementary 
interfaces between an IDP and its partner. In other 
words, is there a mechanism by which an IDP could bind 
to a partner without itself forming specific structure? 

Binding without structure formation: is it possible? 

The various experimental observations above could be 
combined to yield a model with an overall electrostatic 
attraction between an IDP and its partner, coupled with 
several local docking interactions that rapidly convert 
from one to another. Such an interaction mechanism 
could lead to a specific association between an IDP and 
its partner without the formation of stable structure. 

Let's consider this possibility from a more traditional view. 
Protein complex formation typically involves at least two 
steps. Upon meeting, an encounter complex is formed, 
either proceeding towards the final complex or towards 
dissociation. Evidence suggests that encounter complexes 
are dominated by electrostatic forces, but hydrophobic 
interactions can also play a role [26]. An interesting variant 
is to consider the subsequent events in terms of game 
theory, according to which the interacting partners 
continually affect the conformational landscapes of each 
other in such a way that consecutive steps depend on prior 
steps until the final complex is formed [27,28]. What if the 
folding funnel for the overall complex were rather flat, 
with no energy minimum corresponding to one specific 
structure? In this case, the moves and counter-moves 
would continue endlessly, leading to a long-lived, 
dynamically fluctuating encounter complex that, of 
course, could dissociate at any moment. Since there are 
data supporting the existence of encounter complexes, the 
question then becomes whether it is possible for 
encounter complexes to be long-lived. 

The above discussion has an interesting parallel to the 
molten globule concept. The capability of a polypeptide 
chain to adopt partially folded intermediates was first 
proposed as a part of the framework model of protein 
folding, with a collapsed, but internally dynamic, short-lived 
transient intermediate protein form in protein 
folding [29]. In later studies, the 
intermediate with a collapsed, but internally dynamic, 
structure was shown to be a stable form for some proteins, 
following slight structural destabilization using a variety 
of treatments [30]. Next, this intermediate was shown to 
be transiently populated at the early stages of the globular 
protein folding [31]. Finally, certain proteins were 
suggested to form molten globules in their functional 
states [32]. So, will encounter complexes turn out to 
exhibit a similar progression, being first recognized as 
transients, then as stable forms under some particular 
conditions, then as stable forms under physiological 
conditions for some protein sequences? Time will tell. 

Has any protein-protein interaction without any structure formation ever been observed? 

A number of IDPs interact with each other to form 
dimers that sometimes exhibit very simple folds such as 
leucine zippers [33], and that at other times exhibit more 
complex folds such as helical bundles [34,35]. For a few 
examples, the IDP dimers are highly dynamic with only 
localized regions that show evidence of structure formation 
[36,18]. All of these examples, more or less, fit the 
standard view of coupled binding and folding or 
induced fit. 

However, Sigalov and co-workers have reported homodimerization 
of the cytoplasmic region of the T cell 
receptor zeta subunit that is accompanied neither by 
measurable shifts in the CD spectra [37] nor by measurable 
chemical shift changes in the NMR spectra, possibly 
suggesting association without structure formation [38]. 
Other explanations for this observation include the 
possibility that the CD spectra might not be sensitive 
enough to pick up formation of highly localized structure, 
or that technical problems such as exchange broadening for 
the same protein regions before and after dimerization 
could obscure the formation of structure during the interaction. 
Further work therefore needs to be carried out to prove 
that the molecular association is truly occurring without 
the formation of protein structure. Furthermore, the initial 
report was in 2004 and other laboratories have not yet 
reported similar findings, and this absence of confirmatory 
data adds uncertainty to Sigalov's interpretation of his 
observations. However, given the results for a number of 
complexes in which IDP regions have been shown to 
contribute to the overall free energy of the protein-protein 
interaction [24,25,39,40] and given evidence for 
the existence of encounter complexes along with the 
various models to explain their interaction without 
structure [26-28], it is our opinion that Sigalov's results 
cannot be dismissed out of hand. While our own view is 
that specific protein-protein interactions likely require 
at least some localized structure at a key contact point 
such as observed for the Sic1-CDC4 interaction, formation 
of protein complexes with highly transient structure 
formation ought to be taken as a possibility until ruled 
out by further studies. 

Flexible linkers, flexibility, and molecular recognition 

Direct involvement in molecular recognition is not the 
only type of biological function carried out by proteins. 
In addition, disorder can affect molecular recognition 
without direct involvement in the binding interface. Two 
aspects will be discussed here: flexible linkers and free 
energy in the unbound state. These examples emphasize 
that the final 3D structure of the complex is not the only 
biologically important aspect of molecular recognition, 
and that the on-rates, off-rates, and conformational 
changes enabling association and dissociation can also 
contribute to biological function. 

Flexible linkers can enable combinations of interactions 
that would not be sterically allowed by completely rigid 
structures. For example, calmodulin uses a flexible linker 
between two domains to wrap around its target helix 
[41]. Formation and dissociation of the calmodulin- 
target complex would simply not be possible without the 
flexibility of the linker. Multiple zinc fingers connected 
by flexible linkers enable certain transcription factors to 
wrap up their target DNA molecules [42]. Again such 
complexes could not form or dissociate without the 
flexibility of the linkers. 

Flexible linkers can also affect rates of association and 
affinities. A particularly interesting example is provided by 
voltage gated ion channels. Such channels exhibit three 
states: closed (voltage sensitive), open, and inactive 
(voltage insensitive). While in the open state, ion flow 
through the open channel collapses the cell membrane 
potential. Thus, the amount of time spent in the open state 
is important for the biological function of the channel. The 
mechanism of closing is via a 'ball and chain', where the 
terminus of a disordered region closes the open channel 
by a binding event [43-44]. Lengthening of the 'chain' 
slows the closure, while shortening the chain speeds it up 
[45,46], suggesting the possibility that the ball undergoes a 
random-walk search for its binding site, but more detailed 
studies are needed to confirm this model [47]. Comparing 
the orthologous voltage channels in sperm from different 
mammals shows that the exon corresponding to the chain 
region has significant length variability that arises from 
insertions and deletions (indels). The number of indel 
substitutions is 5 to 8 times higher than is generally 
observed in genomic studies and indels within the 
disordered regions are considerably longer than average 
indels, suggesting that positive selection is occurring [48]. 
The authors suggest that the observed chain-region length 
variability, which can affect sperm motility, may be an 
important determinant in sperm competition, thus 
accounting for the positive selection. 

Another very interesting example is provided by the 
entirely unstructured ~200-residue kinase inhibitor 
p27Kip1 [49], which plays a key role in the control of 
eukaryotic cell division by the inhibition of several 
different cyclin-dependent protein kinases (Cdks) 
[50,51]. For one such complex, ~70 residues of p27Kip1 
wrap around the outside of a dimer of a cyclin and its 
cognate Cdk [52]. The flexibility of p27Kip1 allows it to 
associate and dissociate segmentally [50,53], thereby 
providing opportunities for regulation and control. More 
specifically, segmental dissociation enables phosphorylation 
of p27Kip1's Y88 by a non-receptor tyrosine kinase, 
leading to the exposure of the Cdk's active site, which 
then phosphorylates p27Kip1's T187. This second phosphorylation 
provides a signal for ubiquitination, which 
then leads to digestion of p27Kip1 via the proteasome, 
which in turn promotes cell cycle progression [50,51]. 

This pathway of phosphorylation leading to ubiquitina- 
tion, in turn leading to degradation, is very commonly 
observed in eukaryotic cells as a means to remove or 
deplete the levels of key regulatory proteins [54]. Both 
phosphorylation and ubiquitination commonly occur in 
regions of disorder [55-57], and having a sufficiently 
long region of disorder appears to be important for entry 
into the proteasome [58]. 

As a final example of the consequences of flexible linkers 
on molecular recognition, Kuriyan and Eisenberg [59] 
argue that the proximity brought about by flexible 
linkers brings about an amplification of the effects on 
one domain of random mutations in the colocalized 
domain. Through natural selection, this amplification of 
effects by proximity leads to specific interactions and to a 
startling variety of complex allosteric controls. 

As for effects arising from the free energy of the unbound 
state, the unbound, flexible state is the starting point for 
the many examples involving disorder-to-order transitions 
upon binding to their partners, which in turn might be 
structured or also disordered. While structure accounts for 
the final recognition, the rate of association or dissociation 
might also be important for biological function: mutations 
that do not involve any of the interacting residues 
but that affect the free energy in the unbound state would 
affect the final binding constant [60]. Such an alteration in 
free energy by mutation could also be viewed as an 
alteration in the underlying protein ensemble. This 
ensemble view was recently used to explain mutational 
effects on binding events that lead to allostery [61,62]. In 
this view, rather than affecting a Rube-Goldberg type 
pathway underlying allostery, the mutation could be 
affecting the conformational ensemble and hence the 
allostery. In a recent study, the binding constants of 
associations between structured proteins were shown to 
have a correlation with the measured off-rates and to be 
rather independent of on-rates. On the other hand, 
binding constants of associations involving an initially 
disordered protein were shown to have a correlation with 
the on-rates [63]. A possible mechanism here is that a 
reduced free energy in the unbound state (as compared to 
the bound state) would be expected to both reduce the 
final affinity and to slow the on-rate. These disorder- 
dependent effects on binding kinetics and affinity values 
don't directly involve the molecular details of the bound 
state, but are nevertheless likely to lead to important 
biological consequences. 
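
The link between rates, affinity, and the free energy of the unbound state can be made explicit with standard relations. The block below is a sketch that assumes simple two-state binding, so that the dissociation constant is the ratio of the off-rate to the on-rate. The symbols (the rate constants, the standard concentration c° = 1 M, and the stabilization δ of the unbound state) are introduced here for illustration and do not come from [60] or [63].

```latex
% Affinity in terms of rates and free energies (two-state binding assumed)
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}},
\qquad
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln\frac{K_d}{c^{\circ}}
  = G_{\mathrm{bound}} - G_{\mathrm{unbound}}

% Stabilizing only the unbound state by \delta > 0 leaves G_bound unchanged:
G_{\mathrm{unbound}} \;\to\; G_{\mathrm{unbound}} - \delta
\quad\Longrightarrow\quad
\Delta G^{\circ}_{\mathrm{bind}} \;\to\; \Delta G^{\circ}_{\mathrm{bind}} + \delta,
\qquad
K_d \;\to\; K_d \, e^{\delta / RT}
```

A lower free energy in the unbound state therefore weakens the affinity by the factor exp(δ/RT). If that change appears mainly in the on-rate, the affinity will track the on-rate, in line with the correlation reported for initially disordered proteins.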

One-to-many binders and multifacial complexes 

IDPs are known to participate in one-to-many and 
many-to-one interactions, where one IDP or one intrinsi- 
cally disordered region (IDR) binds to multiple partners 
potentially gaining very different structures in the bound 
state, or where multiple unrelated IDPs/IDRs bind to one 
partner, potentially gaining similar structures in the bound 
state [20,21,60]. 

The one-to-many binding mechanism is especially 
interesting since it might generate multifacial complexes, 
where the same region of an IDP can be engaged in 
interaction with multiple unrelated partners and be able 
to fold into very different conformations in the bound 
state. One illustrative example of such a one-to-many 
binder is p53, a single C-terminal region of which is 
known to interact with at least four different partners 
[20]. The amino acids involved in each interaction show 
a significant overlap and no two of these interactions 
could exist simultaneously. Furthermore, the same 
residues adopt helix, sheet, and two different irregular 
structures when associated with the different partners. 
Finally, the same amino acids are buried to very different 
extents in each of the molecular associations [20]. These 
results show that one of the functional advantages of 
IDPs/IDRs over ordered proteins and domains is the 
ability of one disordered segment to bind to multiple 
partners due to its ability to adopt different conforma- 
tions in the bound state. 

Recent analysis revealed that the C-terminal recognition 
domain of p53 is not a unique entity and several other 
IDPs can be engaged in the formation of multifacial 
complexes [21]. These examples highlight the transient 
nature of the intrinsic disorder-based interactions and 
emphasize the extreme adaptability of IDPs. In general, 
complexes involving disordered proteins are drastically 
different from the complexes formed by ordered proteins. 

The case of the elastomeric proteins 

An important example of biologically important yet 
disordered complexes is that of the elastomeric proteins, 
which have a wide range of crucial functions and are 
involved in various unique mechanisms where they 
provide the high efficiency elastic recoil necessary to 
undergo reversible deformation [64]. These proteins are 
found in the human arterial wall, the capture spiral of 
spider webs, the hinge of scallop shells, and are involved 
in the jumping mechanism of fleas. Since it has been 
suggested that the elastic recoil of proteins is due to a 
combination of internal energy and entropy, and since 
the dominant driving force in this recoil process is the 
increased entropy of the relaxed state relative to the 
stretched state, it was pointed out that intrinsic disorder 
plays a crucial role in the function of these rubber-like 
elastomeric proteins. In agreement with this hypothesis, 
the functional aggregates of these proteins were 
described as intrinsically disordered or fuzzy complexes 
with high polypeptide chain entropy [64]. Although 
these disordered elastomers possess a broad range of 
sequence motifs, mechanical properties and biological 
functions, all of them are dramatically enriched in 
proline and glycine residues [65]. This P and G 
enrichment plays a central role in defining the elastin-like 
properties of disordered elastomers that form 
disordered functional aggregates (including disordered 
fibers), and clearly separates all elastomeric proteins 
from the amyloidogenic proteins/peptides that form 
insoluble amyloid-like fibrils characterized by the cross-β-sheet 
structure with the β-strands perpendicular to the 
fiber axis. Figure 1 illustrates this observation by showing 
a two-dimensional diagram that relates the P and G 
contents of natural elastomeric protein domains and 
proteins that were experimentally shown to form amyloid 
fibrils [65]. In this plot, a clear separation is seen between 
elastomeric and amyloidogenic sequences. These data 
provide a clear explanation of why elastomeric proteins 
are expected to form disordered fibers: their amino acid 
sequences are enriched in the structure-breaking G and P 
residues and therefore are naturally selected not to form 
lengthy ordered segments. 

Figure 1. A two-dimensional plot correlating proline and glycine content for a wide variety of elastomeric and amyloidogenic peptides 
the frog Notaden bennetti, the PEVK domains of titin, wheat glutenin protein, and the strongest spider silks, namely MaSp1 and minor ampullate spidroin 
(MiSp). Figure reproduced from [65]. Abbreviations: AcSp, aciniform silk; MaSp, major ampullate spidroin; MiSp, minor ampullate spidroin; TuSp1, 
tubuliform silk. 
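
The two coordinates used in the proline/glycine diagram are simple compositional fractions that can be computed directly from a sequence. The following is a minimal sketch; the function name is hypothetical, the example sequence is the well-known elastin-like pentapeptide repeat VPGVG chosen only for illustration, and no classification threshold is implied because none is given in the text.

```python
def pg_content(sequence):
    """Return (proline fraction, glycine fraction) for a one-letter sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalize case
    if not seq:
        raise ValueError("sequence is empty")
    n = len(seq)
    return seq.count("P") / n, seq.count("G") / n


if __name__ == "__main__":
    # Illustrative input only: four copies of an elastin-like repeat.
    example = "VPGVG" * 4
    p_frac, g_frac = pg_content(example)
    print(f"P = {p_frac:.2f}, G = {g_frac:.2f}")
```

Each sequence then becomes one point on a plot of proline fraction against glycine fraction, which is the kind of two-dimensional diagram described for [65].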

The 'IDP' concept: necessary or not? 

To our knowledge, the first report of a significant-sized 
region of missing electron density was in the structure of 
the extracellular nuclease from Staphylococcus aureus, as 
described in 1971 [66]. Two such regions were observed 
and the authors suggested that these were "disordered" 
and both highly solvated and highly mobile. The authors 
also reported the extreme trypsin sensitivity of these 
regions. Even earlier, optical rotatory dispersion was 
used to identify a few proteins that appeared to be fully 
unstructured under apparently physiological conditions, 
and from such studies one author suggested that there is 
a category of "disordered proteins" [67]. 



Subsequently many regions of proteins have been found 
to lack 3D structure under apparently physiological con- 
ditions in the absence of a binding partner, and NMR has 
revealed many additional proteins that appeared to be 
entirely unstructured. Release 6.0 of DISPROT [68] lists 
667 proteins with 1,467 disordered regions that are 
associated with biological function; this set includes 112 
wholly disordered proteins. 

When such disordered proteins and regions were first 
discovered, the standard view was that they were some- 
how likely to be structured, except that they were denatured 
during isolation or lacked a critical partner that got lost 
during isolation. Indeed, some of the scientists who were 
involved in carrying out early, key work on these proteins 
[69,70] tell us that when they were graduate students 
doing the work in the citations just given, they repeated 
protein purification multiple times using different proto- 
cols because neither they nor their advisors could believe 
that unstructured proteins could be carrying out the bio- 
logical functions being observed (Daughdrill, Kriwacki, 
personal communications). 

A key development in the study of these proteins in our 
view has been the development of disorder predictors 
that used amino acid sequence or composition as inputs 
[71,72]. These predictors give results much better than 
expected by chance, leading to the conclusion that, to a 
considerable degree, lack of structure is encoded by the 
amino acid sequence. In other words, disordered proteins 
and regions have amino acid compositions that are distinc- 
tly different from the compositions of structured proteins. 
Thus, this observation links disorder to the DNA sequence, 
leading to an extension of the standard Central Dogma. 
That is, the standard Central Dogma is given by the follow- 
ing steps: (1) DNA sequence, (2) RNA sequence, (3) Protein 
sequence, (4) structure, and (5) function. The extension, 
however, is given by the following steps: (1) DNA sequence, 
(2) RNA sequence, (3) Protein sequence, (4) intrinsically 
disordered ensemble, and (5) function. 
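
To make the idea of a composition-based predictor concrete, here is a minimal sketch of one widely used heuristic, the charge-hydropathy plot. It is offered as an illustration only: the text above does not say which algorithms the predictors cited as [71,72] use. The Kyte-Doolittle values and the boundary line (mean net charge = 2.785 x mean hydropathy - 1.151) are quoted from memory of the published literature and should be checked against the primary sources; the function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (range -4.5 to 4.5); assumed, verify before use.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy with each residue value rescaled to the 0..1 range."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_plot_side(seq):
    """Report which side of the assumed boundary line a sequence falls on."""
    boundary = 2.785 * mean_hydropathy(seq) - 1.151
    return "disordered-like" if mean_net_charge(seq) > boundary else "ordered-like"


# Hypothetical toy sequences: one highly charged and polar, one hydrophobic.
for seq in ("EEEEEEEEEEKK", "LIVALIVAFLIVAG"):
    print(seq, ch_plot_side(seq))
```

The sketch uses only whole-sequence averages, so it cannot locate disordered regions within an otherwise ordered protein, and it raises a KeyError on non-standard residue codes.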

In our view, a portion of the 'folding code' (and sometimes a 
significant part of it) that defines the ability of ordered 
proteins to spontaneously gain a unique biologically 
active structure is missing for IDPs. This missing portion 
of the 'folding code' (or a part of it) can be supplemented 
by binding partner(s). As a result, a key difference 
between structured and disordered proteins is that the 
former fold first and then bind to their partners while the 
latter remain unfolded until they bind their partners. 
Other researchers suggest that this distinction makes no 
difference. To emphasize that this is a distinction 
without a difference, recently the term "proteins waiting 
for partners" (PWPs) was proposed as an alternative to 
the term "disordered" [73]. Above we describe many 
examples in which disordered proteins have functions 
other than partner binding, so this term cannot be used 
for all types of disordered protein. Also, it is not clear 
how the PWPs concept would apply to examples such as 
the C-terminus of p53 described above, in which the 
same disordered region assumes four different confor- 
mations when binding to four different partners. This 
example suggests the possibility that the same disordered 
region can switch from one partner to another, with the 
disordered region changing its shape as it changes its 
partner. This sort of behavior seems to be much more 
dynamic than just 'waiting for a partner'. 

'Flexibility' is often proposed to describe motions in pro- 
teins covering both folded and unfolded forms. When we 
started studying these proteins, one of us chose the 
descriptor 'natively unfolded protein' [72,74] and the 
other one of us chose 'disordered protein' [71]. Both of us 
considered but rejected 'flexibility' as a descriptor. Our 
views regarding 'flexibility' were that this term is applied 
to both structured and unfolded proteins but describes 
entirely different processes for the two protein forms. That 
is, for structured proteins, flexibility refers to periodic or 
slightly aperiodic motions as atoms oscillate about their 
equilibrium positions, with higher flexibility referring to 
larger amplitudes for the oscillations. During these oscilla- 
tions, the overall shape of the molecule changes very little. 
On the other hand, for unfolded proteins, flexibility refers 
to massive changes in backbone and side chain dihedral 
angles, leading to large-scale changes in overall shape. 
Given that flexibility has entirely different meanings for 
structured and unstructured proteins, using the same term 
for both protein forms tends to blur the very large differ- 
ences in behavior. 

With regard to replacing disorder with either flexibility or 
PWPs, our view on these suggestions can be summarized 
by a well-known phrase: "What's in a name? That which we 
call a rose by any other name would smell as sweet" [75]. 

The fundamental distinction between folding first and 
then binding as compared to concomitant binding and 
folding is reflected by the marked differences in the 
amino acid composition between the two types of 
proteins. Indeed, early theoretical studies on protein 
folding suggested that whether a protein folds or not 
depends on its amino acid composition, and if it has a 
composition commensurate with folding, then the 
sequence patterns determine which fold is favored [76]. 
We [72,77,78] and others [79] have pointed out that the 
amino acid compositions of IDPs are entirely consistent 
with their lack of folding. The high polarity of these 
sequences is very much along the lines of the suggestions 
from the early theoretical studies for sequences that 
would fail to fold [76]. 

Researchers who question the existence of IDPs in vivo 
often point out that cells have elaborate mechanisms to 
deal with misfolded proteins and that disordered 
proteins would be cleared by these mechanisms. Thus, 
they conclude that disordered proteins cannot exist in 
cells except transiently. In our opinion, such suggestions 
are misguided for three reasons. First, the misfolded 
protein response is confined to the endoplasmic 
reticulum, so it is unclear to what extent other parts of 
the cell are under surveillance for protein misfolding. 
Second, such suggestions are based on the assumption 
that two very different kinds of sequence will both be 
readily recognized by the unfolded protein response 
systems and cleared by the cell: an amino acid sequence 
that is commensurate with folding (e.g. a high level of 
hydrophobic groups and aromatics) but that is unfolded 
or misfolded, and a highly polar sequence that has 
evolved to be unfolded. We think that it 
is equally likely that IDPs and regions have evolved to 
avoid the unfolded protein response by having sequences 
not recognized by those systems. Where is the proof that 
these systems recognize all types of sequences? Indeed, 
the mechanism by which the unfolded protein response 
recognizes its substrate proteins is currently ambiguous, 
and whether, or which, IDPs are cleared by this system is 
not yet understood (Ron Wek, personal communica- 
tion). Also, in-cell NMR data demonstrate the existence 
and stability of IDPs even when inside both prokaryotic 
and eukaryotic cells [80-90]; why doesn't the misfolded 
protein response rapidly remove these proteins? Third, 
to use humans as an example organism, some of our 
proteins turn over with half-lives of less than a minute 
while others exist for the life of the human, giving almost 
eight orders of magnitude difference in protein stability. 
The stability of each protein is an important aspect of 
its biology. Studies on the relationship between protein 
lifetimes and protein disorder suggest that some dis- 
ordered proteins have long lifetimes, but, on the other 
hand, the short-lifetime proteins are rich in disorder [91]. 
Perhaps both a disordered region and a particular signal, 
such as the PEST motif [92,93], are needed for a protein 
to exhibit a short half-life. The important point here is 
that disordered regions likely help to promote short 
protein lifetimes as an important aspect of their biology 
or, to put it another way, lifetime modulation is an 
important biological function of disordered proteins. 
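
The "almost eight orders of magnitude" figure quoted above can be checked with a short calculation. This is a rough sketch under two stated assumptions that are not given in the text: a short half-life of one minute and a human lifespan of 80 years.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def orders_of_magnitude(short_seconds, long_seconds):
    """Base-10 logarithm of the ratio between two durations."""
    return math.log10(long_seconds / short_seconds)


# Assumed values: 1 minute for the short half-life, 80 years for the lifespan.
span = orders_of_magnitude(60.0, 80 * SECONDS_PER_YEAR)
print(f"{span:.1f} orders of magnitude")
```

Under these assumptions the ratio is a little under eight orders of magnitude, and it moves closer to eight for half-lives shorter than one minute.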

Is there any evidence that disorder is a product of 
evolution? Studies on the evolution of structured proteins 
suggest that regions of proteins with a high packing density 
show fewer amino acid changes over evolutionary time as 
compared to regions with lower packing densities. That is, 
if positional mutation rates (expressed as Shannon's 
entropy) are plotted versus tightness of packing (expressed 
as 1/density), virtually a straight line is observed with 
lower packing densities showing higher sequence varia- 
bility (see Figure 3 in [94]). Of course many years earlier it 
was pointed out that the residues in the core of a protein 
family exhibit fewer mutations over evolutionary time as 
compared to residues on the surface of the same proteins 
[95]. Thus, mutation rates are strongly correlated with 
structural features of proteins. If the mutation rates of the 
structured and disordered regions of proteins are com- 
pared, in general (but not always), the mutation rates of 
the disordered regions are much higher than the mutation 
rates of the structured parts of the same proteins [96-98]. In 
our view, these observations are simply explained assum- 
ing that disordered proteins evolve differently from 
ordered proteins to maintain their disordered structure 
under physiological conditions, which is necessary for 
their functions. 
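
The positional variability measure mentioned above, Shannon's entropy, can be sketched for a multiple sequence alignment as follows. This is a minimal illustration, assuming equal-length aligned sequences with no gap handling; the function names and the toy alignment are hypothetical and are not data from [94-98].

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of the residue frequencies in one column."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over residues; identical columns give exactly 0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def positional_entropies(alignment):
    """Entropy of every column of an alignment of equal-length sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment: conserved positions first, a variable one last.
alignment = ["MKVLA", "MKILS", "MRVLT", "MKVLG"]
entropies = positional_entropies(alignment)
print([round(e, 2) for e in entropies])
```

Fully conserved columns score zero and a column with four different residues in four sequences scores two bits. Comparing mean entropies over the ordered and disordered segments of the same alignment would give the kind of contrast described above.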

Until recently, disordered proteins or regions were 
largely ignored. However, each week there are now 
about 17 publications (estimated by Caron Morales 
from the last 10 weeks using DisProt's standard keyword 
search of PubMed) that focus on the characterization and 
functions of these proteins. It could be reasonably argued 
that there is nothing new in the IDP concept, that all of 
the current views of these proteins follow naturally from 
long-held views of protein structure and function. From 
a chemistry and physics point of view, that is certainly 
true. However, the fact that these proteins were largely 
ignored previously and now they are being actively 
studied suggests that developing the IDP concepts has 
served the useful purpose of bringing attention to these 
proteins and to understanding the biological functions 
with which they are involved. The reader is invited to 
make up his or her own mind regarding the utility, or 
lack thereof, of the IDP concept. 